Biochemical and regulatory interactions central to biological networks
are expected to cause extensive genetic interactions or
epistasis affecting the heritability of complex traits and the dis- 
tribution of genotypes in populations. However, the inference 
of epistasis from the observed phenotype-genotype correlation 
is impeded by statistical difficulties, while the theoretical under- 
standing of the effects of epistasis remains limited, in turn limiting 
our ability to interpret data. Of particular interest is the biologi- 
cally relevant situation of numerous interacting genetic loci with 
small individual contributions to fitness. Here, we present a com- 
putational model of selection dynamics involving many epistatic 
loci in a recombining population. We demonstrate that a large 
number of polymorphic interacting loci can, despite frequent re- 
combination, exhibit cooperative behavior that locks alleles into 
favorable genotypes leading to a population consisting of a set of 
competing clones. When the recombination rate exceeds a cer- 
tain critical value that depends on the strength of epistasis, this 
"genotype selection" regime disappears in an abrupt transition, 
giving way to "allele selection", the regime where different loci are
only weakly correlated, as expected in sexually reproducing
populations. We show that large populations attain highest fitness at
a recombination rate just below critical. Clustering of interacting 
sets of genes on a chromosome leads to the emergence of an in- 
termediate regime, where blocks of cooperating alleles lock into 
genetic modules. These haplotype blocks disappear in a second 
transition to pure allele selection. Our results demonstrate that 
the collective effect of many weak epistatic interactions can have 
dramatic effects on the population structure. 

Selection acting on genetic polymorphisms in populations is a ma- 
jor force of evolution ( 1 ; 2, 3, 4) and it is possible to identify 
specific loci under positive selection (e.g. the Adh locus in Drosophila 
Qj). Yet, the attribution of fitness differentials to specific allelic
variants and combinations remains a great challenge (5). Efforts to
correlate quantitative phenotypes with genetic polymorphisms typically
identify a small number of loci with a significant contribution to the
observed phenotypic variance, but leave much of the variance
unaccounted for (6). This unaccounted variance is believed to arise from
a large number of loci with small individual contributions, or to be due
to epistasis, and quite likely involves both effects. New studies
accumulate evidence that epistasis is widespread and accounts for a
significant fraction of phenotypic variation (e.g. in yeast (7; 8; 9)).
Additional evidence for epistasis comes from crosses of mildly
diverged strains, where the recombinant progeny often has reduced
average fitness, i.e. displays outbreeding depression. The reduction
in fitness is attributed to the breakdown of favorable combinations of
alleles in the ancestral strains (10). Outbreeding depression is often
observed in partly selfing organisms such as C. elegans (11) or plants
(12), species with strong geographic isolation such as copepods (13), or
facultatively mating organisms such as yeast (14). While most
recombinant genotypes are less fit, novel genotypes that perform better
than either parental strain can be generated as well (15). Such
outcrossing events could play an important role in evolution.

Competition between epistatic selection and recombination, explicit
in the outbreeding depression phenomenon, is the focus of the present
study. In the presence of epistasis, selection, by increasing the
frequency of favorable genotypes, establishes correlations between
alleles at different loci. Recombination, on the other hand, reshuffles
alleles and randomizes genotypes, breaking up coadapted loci. Because
the recombination rate between any two loci is largely determined
by their physical distance on the chromosome, the effect of genetic
interactions depends on gene location. It is known that functionally
related genes tend to cluster (16; 17), suggesting selection on
gene order. Furthermore, chromosomes have regions of infrequent
recombination, interspersed with recombination hotspots (18). Does
selection have a hand in defining low recombination regions? To
understand how evolution shaped genomes as we observe them today, we
have to tackle the problem of how selection acts on many interacting
polymorphisms for a large range of recombination rates (19).

Fig. 1. The two regimes of sexual reproduction. Panels a & b show the
simulated time course of the genotype distribution in a population of
500 individuals with epistatic fitness variance $V_I = \sigma^2 = 0.005$
and the outcrossing rate r = 0.1 (a) and r = 0.4 (b). Like genotypes are
assigned the same color and stacked on top of each other. Sketches
illustrating the population dynamics in the two cases are shown as
insets in panel c. At low outcrossing rates, fit genotypes can
proliferate. The genotype distribution rapidly coarsens and clones form
(horizontal stripes in panel a). With frequent outcrossing, genes are
rapidly reshuffled and genotypes do not persist over many generations,
resulting in the pointillist pattern in panel b. Fixation happens at a
later time and is not shown. Panel c: The two regimes are separated by a
sharp boundary set by the strength of epistasis. For $r < r_c$, the
population dynamics is described by clonal competition (CC); for
$r > r_c$ by quasi linkage equilibrium (QLE).

Standing variation harbored in natural populations provides important
raw material for selection to act upon, in particular after a sudden
change in environments or hybridization events (20). In such a
situation, selection will reduce genetic variation until a new mutation-
selection equilibrium is reached. Here, we show that the selection dy- 
namics on standing variation at a large number of loci can be strongly 
affected by epistasis, even if the individual contribution of each locus 
is small. The competition between selection on epistasis and recombi- 
nation gives rise to two distinct regimes at high and low recombination 
rates separated by a sharp transition. The population dynamics in the 
two regimes is illustrated in Fig. 1a,b: i) the "clonal competition"
(CC) regime, which occurs for recombination rates $r < r_c$, and ii)
the Quasi Linkage Equilibrium (QLE) regime for $r > r_c$. The
different nature of the two regimes is best understood by considering
the limiting cases of no and frequent recombination. In the case of 
purely asexual reproduction, selection operates on entire genotypes 
and results in clonal expansion of the fitter ones. The genetic varia- 
tion present in the initial population is lost on a timescale inversely 
proportional to the average magnitude of fitness differentials between 
genotypes present in the population. Successful genotypes persist in 
time, which is apparent as continuous broad stripes of one color in 
Fig. 1a. The amplification of a small number of fit genotypes
induces strong correlations or linkage disequilibrium among loci. In
presence of epistasis, a little recombination does not change this pic- 
ture qualitatively, as most recombinant genotypes are less fit than the 
prevailing clones and novel successful clones are rare. Nevertheless 
recombination is very important because it continuously introduces 
new genotypes leading to an increase in fitness attained by the pop- 
ulation at long times. In the limit of high recombination genotypes 
are short-lived and essentially unique, resulting in a "pointillist" color 
pattern in Fig. 1b. Each allelic variant is therefore selected on the
basis of its effect on fitness, averaged over many possible genetic 
backgrounds. The time scale on which allele frequencies change is 
given by the inverse of these marginal fitness effects. The term "link- 
age equilibrium" in QLE refers to the negligible correlations between 
loci, which are constantly reshuffled by recombination. 

As we show below, the transition between the two regimes sharp- 
ens as the number of segregating loci L increases. The sharpening of 
the transition is related to the different scaling of the time scale of se- 
lection in the two regimes. For large L, the marginal fitness effects of 
individual loci become small compared to fitness differentials among 
individuals (assuming they are all of similar size, this ratio decreases
as $\sim 1/\sqrt{L}$). Hence, the dynamics in the QLE regime slows down
compared to the CC regime as L increases. The CC and QLE regimes
correspond to different regions of the parameter space spanned by
the relative strength of epistasis and the ratio of outcrossing or
recombination rate to the strength of selection, as sketched in Fig. 1c.
The QLE dynamics was first described by Kimura (21) in the limit
of weak selection/fast recombination for a pair of bi-allelic loci and
subsequently generalized to multi-locus systems (22; 23). The
possibility of a collective behavior involving linkage disequilibrium on
many loci and selection effectively acting on the whole chromosome 
as a unit has been pointed out before in the context of overdominance 
by Franklin and Lewontin ( 24 ) in the strong selection limit. However, 
these studies of the two different limits do not reveal the breakdown 
of QLE and the transition to CC as the generic behavior of multi-locus 
epistatic systems. 

To underscore the general nature of the results, we shall consider 
two different models of epistasis. The first model will follow the 
common treatment of epistasis in quantitative traits which assumes 
that the epistatic contribution to fitness is disrupted when the parental 
genes are mixed in sexual reproduction (25; 26). This assumption
becomes exact when the epistatic component of fitness of a specific 
genotype is a random number (which depends on the genotype, but is 
fixed in time) and we shall call this model the random epistasis (RE) 
model. Within the RE model, any change in the genotype randomizes 
the epistatic component of fitness so that the latter is not heritable 
when non-identical parents mate. It is, however, faithfully passed on 
to the offspring in asexual reproduction. For the RE model, genomes 
are propagated asexually with probability 1 — r and with probability r 
are a product of mating where all genes are reassorted, as would be
exactly correct if all genes were on different chromosomes. This model
of facultative mating approximates reproductive strategies common 
in fungi (e.g. yeast) or nematodes and plants. As a more realistic al- 
ternative, we shall also study a model with only pairwise interactions 
between loci (27). This pairwise epistasis (PE) model allows the epistatic
contribution to be partly heritable, as interacting pairs have a chance 
to be inherited together (28). For the PE model, we assume that all 
genes are arranged on one chromosome with a uniform crossover rate 
$\rho$, which allows us to explore haplotype block formation and
implications for recombination rate evolution.

The strength of selection is determined by the variance $\sigma^2$ of the
distribution of fitness in the population. Within our models, the
fitness $F(g)$ of a genotype $g$ is the sum of an additive component $A(g)$
representing independent contributions of alleles and an epistatic part
$E(g)$. For the RE model, the latter is a random number drawn from a
Gaussian distribution, while for the PE model it is a sum of pairwise
interactions with random coefficients $f_{ij}$. The variances $V_A$ and $V_I$
of the distributions of $A(g)$ and $E(g)$ add up to $\sigma^2$, and their
relative magnitude determines the importance of additive effects
compared to epistasis. The two different models and their parameters are
given explicitly in the Methods section. For the sake of simplicity, we
assume haploid genomes. Random and pairwise epistasis represent two
opposite extremes in the complexity of epistasis. While the pairwise
model is more realistic, the generic behavior is most clearly
demonstrated using the RE model with random gene reassortment and
facultative mating.

Fig. 2. The clonal competition regime is characterized by extensive
linkage disequilibrium. a Random epistasis model: For small r, the LD
per locus pair is of order one and fairly independent of L. The inset
shows the data for L = 100 on a logarithmic scale and a mark at the
value of $r_c$. The LD for $r > r_c$ is due to sampling noise, see
Figure S1. b Pairwise epistasis model: For pairwise epistasis, the QLE
approximation gives explicit predictions for LD, which describe the
observed LD very accurately for $\rho > \rho_c$ (black line). For
$\rho < \rho_c$, LD is much larger than the QLE prediction. For both
panels, LD is measured when allelic entropy has decayed 30% from the
initial value ($\sigma^2 = 0.005$, $V_A = 0.1\sigma^2$ and
$V_I = 0.9\sigma^2$). In panel a, N = 10^ and the data shown is averaged
over 100 realizations. To avoid boundary and finite size effects, we
used N = 10^ and a circular chromosome for panel b and averaged over 10
realizations.

Results 

Two regimes of selection dynamics. We performed extensive computer
simulations of our two models for different relative strengths of
epistasis, L = 25 to 200 loci and population sizes between N = 500
and 10®. We initialize simulations in a genetically diverse state, as
would result from multiple crossings of two diverged strains, and
examine the evolution under selection and recombination. The two
regimes differ strongly in the amount of linkage disequilibrium (LD)
(see Methods) built up by selection. Panel a of Fig. 2 shows the
average LD per locus pair for the RE model as a function of the
outcrossing rate r. For $r < r_c$, the LD per locus pair is of order one
and independent of L or N, indicating genome-wide LD. LD builds up
despite a large number of different genotypes in the population
interbreeding constantly. For $r > r_c$, the LD is much smaller, with the
observed value determined by the sampling noise due to the finite
population size (see inset of Fig. 2a and supplementary Figure S1).
Similar behavior occurs in the PE model, as shown in panel b. Above a
critical recombination rate $\rho_c$, the observed linkage disequilibrium
is time independent and well described by the QLE approximation (21; 22)
(straight line, see supplement). The QLE approximation (in the high
$\rho/\sigma$ limit) predicts LD to be proportional to the strength of
pairwise epistasis. Below $\rho_c$, the observed LD is dramatically larger
than the QLE expectation. Here, recombination is sufficiently infrequent
that genotypes with synergistic alleles are amplified faster than they
are taken apart by recombination, see below. As a result, the few
fittest genotypes grow exponentially in number, leading to a strong
correlation in the occurrence of cooperating alleles, independent of
physical linkage (i.e. proximity on the chromosome). This extensive
LD leads to a complete failure when extrapolating results valid in the
high recombination regime across the transition. The relevant
quantity that determines whether fit genotypes can be maintained is the
probability that no crossover occurs, which is given by $e^{-\rho L}$.
Hence, $\rho_c$ is inversely proportional to L.

Self-consistency condition for QLE. The fitness of a genotype can
be decomposed as $F = A + E$, where A is the heritable additive part
and E is the non-heritable epistatic part. As a coarse-grained
descriptor of the population, we consider the joint distribution
$P(A, E; t)$ of the fitness components. In the QLE state, $P(A, E; t)$
evolves approximately as

$\partial_t P(A, E; t) = (F - \bar{F} - r) P(A, E; t) + r \rho(E) \vartheta(A; t)$   [1]

The first term accounts for the exponential growth of genotypes with
fitness advantage $F - \bar{F}$ and the loss due to recombination at rate
r. The second term accounts for the production of genotypes through
recombination. To a good approximation, the distribution of A among
recombinant offspring is identical to that among the parents,
$\vartheta(A) = \int dE\, P(A, E)$, which in turn is approximately
Gaussian (29). The distribution of E among recombinant offspring is
independent of the parents and a random sample from the distribution of
epistatic fitness $\rho(E)$, which in our models is a zero-centered
Gaussian. The latter is exactly true for the RE model and holds
approximately for the PE model, where the correlation of E between
ancestor and offspring halves every generation (28). Eq. 1 admits the
factorized solution $P(A, E; t) = \vartheta(A; t) \omega(E)$ with
$\partial_t \vartheta(A; t) = (A - \bar{A}) \vartheta(A; t)$ and
a time-independent distribution of E

$\omega(E) = \frac{r \rho(E)}{r + \bar{E} - E}$   [2]

where $\bar{E}$ is determined by the condition that $\omega(E)$ has to be
normalized. This solution exists only if $E < r + \bar{E}$ for all
genotypes; otherwise, fit genotypes escape recombination and grow as
clones. These two scenarios are illustrated in Fig. 3.

Fig. 3. The breakdown of QLE. Panel a: When the recombination rate
decreases below $r_c$, some individuals have epistatic fitness E larger
than $\bar{E} + r$, and the QLE solution for the distribution of
epistatic fitness in the population breaks down. Individuals to the
right of $\bar{E} + r$ form clones that grow exponentially and the
population condenses into a small number of genotypes. Panel b: For
$r > r_c$, even the largest epistatic fitness contributions do not result
in a growth advantage that exceeds the recombination rate.
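The stationary solution of Eq. 2 can be explored numerically. The sketch below is an illustration only, not the simulation code behind the figures. It assumes that a finite sample of M epistatic fitness values stands in for the distribution of E, so that each sampled genotype k carries the weight (r/M)/(r + Ebar - E_k). The function name `qle_weights`, the sample size and the two outcrossing rates are hypothetical choices. Ebar is found by bisection from the normalization condition.

```python
import random

def qle_weights(E, r, iters=200):
    """Return (Ebar, omega) with omega_k = (r/M) / (r + Ebar - E_k), sum(omega) = 1.

    E is a finite sample of epistatic fitness values (an assumption of this
    sketch); it plays the role of the distribution of E among recombinants.
    """
    M = len(E)
    e_max = max(E)

    def total(ebar):
        return sum(r / (r + ebar - e) for e in E) / M

    # total() diverges as ebar -> e_max - r and is <= 1 at ebar = e_max,
    # so the normalized solution is bracketed by these two values.
    lo, hi = e_max - r, e_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    omega = [(r / M) / (r + hi - e) for e in E]
    return hi, omega

rng = random.Random(1)
V_I = 0.005
E = [rng.gauss(0.0, V_I ** 0.5) for _ in range(2000)]
for r in (1.0, 0.05):
    ebar, omega = qle_weights(E, r)
    print(f"r = {r}: weight of the fittest sampled genotype = {max(omega):.3f}")
```

For a finite sample the normalization can always be met, but at small r it is met only by placing a macroscopic share of the weight on the fittest genotype. This is the discrete counterpart of the clones that escape recombination.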

The normalization condition can be fulfilled only if r is larger
than some critical value $r_c$. The value of $r_c$ is proportional to the
maximal E and hence proportional to the strength of epistasis
$\sqrt{V_I}$. However, it is not the absolute maximum of E among all
possible $2^L$ genotypes that determines $r_c$, but the maximal E that is
encountered by the population before fixation. Hence $r_c$ depends on the
population size, and the functional form of this dependence is
determined by the upper tail of the distribution $\rho(E)$. For the
Gaussian distribution used here, $r_c \sim \sqrt{V_I \ln(rN\tau)}$, where
$\tau$ is the time scale of QLE dynamics discussed below. The product
$rN\tau$ then is the number of genotypes generated through recombination
before fixation. A more detailed discussion is given in the
Supplementary information.
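How the largest epistatic fitness encountered grows with the number of genotypes sampled can be checked directly. A minimal sketch, assuming independent Gaussian draws with variance V_I as in the RE model; the function name and the sample sizes are illustrative. The mean maximum of n draws is compared with the leading-order extreme-value estimate sqrt(2 V_I ln n), which lies above the observed mean but captures the slow logarithmic growth.

```python
import math
import random

def mean_max_epistasis(n, V_I, reps, rng):
    """Average, over reps trials, of the largest of n Gaussian E values."""
    sigma = math.sqrt(V_I)
    total = 0.0
    for _ in range(reps):
        total += max(rng.gauss(0.0, sigma) for _ in range(n))
    return total / reps

rng = random.Random(7)
V_I = 0.005
for n in (100, 10_000):
    observed = mean_max_epistasis(n, V_I, reps=20, rng=rng)
    estimate = math.sqrt(2.0 * V_I * math.log(n))
    print(f"n = {n}: mean max E = {observed:.3f}, sqrt(2 V_I ln n) = {estimate:.3f}")
```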

The breakdown of the QLE state has some similarity to the
error-threshold transition of a quasi-species model (30) in a rugged
fitness landscape (31): Recombination of epistatic loci acts like
deleterious mutations and prevents the emergence of quasi-species or
clones (32; 33) for $r > r_c$.

Maintenance of genetic diversity. The transition between the two
regimes leaves its imprint in virtually every quantity of interest in
population genetics. For instance, the characteristic time for the
decay of genetic diversity, $\tau$ (which we quantify via allele entropy,
see Methods), scales differently with L in the two regimes, as shown in
Fig. 4. At low outcrossing rates, $\tau$ depends only on the total
variance in fitness and neither on the number of loci nor on the
relative strength of additive contributions. This is consistent with the
notion that in the CC regime genotypes are the units on which selection
acts. With more frequent outcrossing, $\tau$ tends to be larger for weak
additive contributions and large L. Beyond a certain outcrossing rate
$r_c$, $\tau$ becomes independent of r, attaining a value inversely
proportional to the additive contribution of the individual loci,
independent of $V_I$ (black diamonds in Fig. 4a). This observation
confirms our assertion that for $r > r_c$, outcrossing decouples the loci
and that the allele frequencies evolve independently under the action of
the additive component of fitness. Given an additive variance $V_A$, the
typical single locus fitness differential is $f \sim \sqrt{V_A/L}$ such
that $\tau$ grows as $\sqrt{L}$ for $r > r_c$. To uncover the universal
behavior in the vicinity of the transition in the limit of large
genomes, we show that the data for different $V_I$, $V_A$ and L collapse
onto a single master curve after appropriate rescaling of the axes, see
Fig. S2. This scaling collapse demonstrates the existence of a sharp
transition in the limit $L \to \infty$ and the scaling of $\tau$ with
$\sqrt{L}$, and shows that $r_c$ is proportional to $\sqrt{V_I}$, as
expected from the self-consistency argument outlined above and sketched
in Fig. 1c. The slowdown of allele dynamics with increasing L in the QLE
regime is at the basis of Fisher's infinitesimal model, put forward to
explain sustained response to selection (5). In one generation, the
allele frequencies change by approximately f, which can be sustained
over $\sim f^{-1}$ generations. The mean fitness increases by $V_A$ per
generation, consistent with Fisher's theorem (23; 25). Our results show
that epistasis causes the breakdown of the infinitesimal model for
$r < r_c$. The pairwise epistasis model is more complex than the random
epistasis model, since the partition of the fitness variance into
additive and epistatic contributions depends on the allele frequencies
and epistasis is "converted" into additive fitness as the population
approaches fixation (34) (a detailed account will be published
elsewhere).

$\rho(E)$ has to go to zero faster than linearly for $r_c$ to exist.

Fig. 4. Panel a shows the time $\tau$ it takes to reduce the allelic
entropy by 30% as a function of r for different parameters. For small r,
$\tau$ is independent of L but increases with r. For $r > r_c$, $\tau$
settles at $c\sqrt{L/V_A}$ (black diamonds), in accord with the
theoretical prediction for single locus dynamics (with c a fitting
parameter). Additional data, including $V_A = 0.5\sigma^2$, and a
collapse confirming the scaling of $\tau$ are shown in supplementary
Figure S2. Panel b: The fitness of the fixated genotype,
$F_{\mathrm{final}}$, as a function of r for two different strengths of
epistasis. At r = 0, the final fitness only depends on the population
size and is independent of the strength of epistasis.
$F_{\mathrm{final}}$ increases with r, followed by a pronounced drop
right below $r_c$. Above $r_c$, $F_{\mathrm{final}}$ is almost constant
and is independent of N. In both panels, $\sigma^2 = 0.005$. Data is
averaged over 25 realizations in panel a and over 100 realizations in
panel b. L = 100 in panel b.

The properties of the genotype which will eventually fixate in the
population depend on the regime in which it was obtained. We find
that the fitness of this fixated genotype depends non-monotonically
on the outcrossing rate and peaks just below the transition, see Fig. 4.
This can be understood as follows. Without recombination, the final
state can be no fitter than the fittest genotype initially present. With
some recombination, the population explores a greater number of
genotypes, potentially finding ones with higher fitness, so that the
fitness of the final state increases with r in the CC regime. A similar
benefit of infrequent recombination due to exploration of genotype space
has been studied in the context of virus evolution for additive fitness
functions (35). As genotype selection gives way to allele selection,
different loci decouple and the epistatic contribution to fitness is
missed, leading to possible fixation of less fit genotypes and a sharp
drop of the final fitness as r approaches $r_c$. The dependence of the
final fitness on the population size N highlights the distinct
properties of the dynamics in the two regimes: In the QLE regime, the
final fitness is virtually identical for different N. This is a
consequence of the fact that the relevant dynamical variables are allele
frequencies, which are well sampled by O(N) individuals. Fluctuations of
the allele frequencies are therefore negligible and the dynamics is
essentially deterministic. This is different in the CC regime, where the
dynamics is driven by the generation of rare, exceptionally fit
genotypes. The rate at which genotypes are generated is proportional to
N, resulting in a pronounced dependence on the population size. QLE
ceases to be deterministic once the marginal fitness effects become
comparable to the inverse population size and random genetic drift
overwhelms selection, see Fig. S3 in the Supplementary information.

Selection on genetic modules. So far, we assumed that each pair
of loci is equally likely to interact epistatically, regardless of their
physical distance on the chromosome. However, there is evidence
that the order of genes along the chromosome is far from random and
that related genes tend to cluster (16; 17). To emulate such a
situation we use the PE model and construct an interaction matrix
$f_{ij}$ where arbitrary pairs interact with a small probability while
clusters of neighboring genes interact with a high probability (see
Methods). For such a hierarchical epistatic structure, we observe, as a
function of increasing crossover rate $\rho$, a sequence of two
transitions which define, sandwiched between CC and QLE, an intermediate
Modular Selection (MS) regime, where the genome-wide LD characteristic
of the CC regime has broken down to a set of modular blocks which are in
quasi linkage equilibrium with each other. The resulting linkage
disequilibrium patterns are shown in Fig. 5. The observed block
structure of LD in the MS regime resembles haplotype blocks (18; 19),
which are normally associated with regions of little recombination
flanked by recombination hotspots. Indeed, the cumulative recombination
history of the chromosomes in the population shows a very heterogeneous
recombination distribution, as shown in panel d of Fig. 5. Yet, here
the origin of these blocks is not intrinsically low recombination (i.e.
physical linkage) but the collective effect of epistatic selection: the
surviving individuals have recombined more often in regions of low
epistasis than in regions of high epistasis, even though the attempted
crossovers are uniformly distributed along the chromosome. Clusters
of epistatic interaction can therefore exert selective pressure to lower
recombination within the cluster. This lack of recombinant survival
has been observed in experiments with mice (36), where inbreeding
results in strong selective pressure on localized clusters of genes,
generating blocks with high LD and reduced effective recombination.

Fig. 5. Clonal competition, modular selection and quasi linkage
equilibrium. Above the diagonal, panels a, b and c show the LD measured
as $D'_{ij}$ between two loci i and j for a linear chromosome of length
L = 100 at three different crossover rates $\rho$. Below the diagonal,
the interaction matrix $f_{ij}$ is shown (the same in all three panels).
At low $\rho$, the sparse long range interactions suffice to produce
genome wide LD. At intermediate $\rho$, distant parts of the genome are
decoupled, but the more strongly interacting clusters still show high
LD, which vanishes at even higher recombination rates. Panel d, top: the
distribution of historic crossovers. Bottom: The relative fitness of
recombinants as a function of the crossover location. LD was measured
when allelic entropy was at 90% of the initial value,
$\sigma^2 = V_I = 0.005$ and N = 10*^.

Conclusion 

To summarize, we have shown that the competition of epistatic selec- 
tion and recombination can give rise to distinct regimes of population 
dynamics, separated by a transition that becomes sharp for large num- 
ber of interacting loci. The QLE and CC regimes are realizations of 
the opposing views on evolution of R.A. Fisher and S. Wright. For 
$r > r_c$, alleles are selected for their additive contributions, while
selection acts on whole genotypes for $r < r_c$. The fundamental
differences between these two regimes show up most clearly in the different
scaling properties of the total LD and the decay time of genetic diver- 
sity. In the low recombination regime, LD is produced independent of 
physical linkage by the collective effect of many interactions. In the 
high recombination regime, LD can be attributed to specific interac- 
tions between pairs of loci and its value, determined by the ratio of the 
interaction strength and the rate of recombination between the loci, 
is small. Our results not only apply to the transition between geno- 
type and allele selection, but also to localized clusters of interacting 
genes on the chromosome. Whenever the epistatic fitness difference 
between different allelic compositions of a cluster exceeds the recom- 
bination rate of the cluster, the fittest will amplify exponentially. Since 
such clusters are often small (36) ($\sim$ Mb), their recombination rates
are low (cM or less); hence fitness differentials below one percent
can suffice to establish CC dynamics. Selective pressure to reduce
recombination load, i.e. the fitness loss through recombination, will
therefore favor the evolution of clusters of interacting genes and might
be an important driving force for the evolution of recombination rate
(37; 38). The effects described above may provide an explanation
for the functional clustering associated with low and high LD regions
reported in HapMap (18).

We would like to thank Michael Elowitz and Marie-Anne Felix for comments
on the manuscript and acknowledge financial support from NSF grant
PHY05-51164.

Methods 

Random epistasis model. A genotype g is described by L binary
variables $s_i = \pm 1$, $i = 1, \ldots, L$. To each genotype we assign a
fitness

$F(g) = f \sum_{i=1}^{L} s_i + E(g)$.   [3]

The first term is the sum of the additive fitness contributions of the
individual loci, each of which has equal magnitude $f = \sqrt{V_A/L}$.
The second term is the non-heritable epistatic fitness, where $E(g)$ is
drawn from a normal distribution with zero mean and variance $V_I$. For a
uniform distribution of genotypes, the additive fitness variance is
$V_A$, the epistatic variance is $V_I$, and the total variance is
$\sigma^2 = V_A + V_I$.
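The random epistasis fitness function can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the class name `RandomEpistasisFitness` is hypothetical, and caching one Gaussian draw per genotype is simply one way to make the epistatic part random across genotypes but fixed in time.

```python
import math
import random

class RandomEpistasisFitness:
    """Fitness F(g) = f * sum(s_i) + E(g) with a fixed random E per genotype."""

    def __init__(self, L, V_A, V_I, seed=0):
        self.f = math.sqrt(V_A / L)        # equal additive effect per locus
        self.sigma_I = math.sqrt(V_I)      # std. dev. of the epistatic part
        self._rng = random.Random(seed)
        self._epistasis = {}               # genotype -> E(g), drawn once

    def __call__(self, genotype):
        key = tuple(genotype)
        if key not in self._epistasis:
            self._epistasis[key] = self._rng.gauss(0.0, self.sigma_I)
        return self.f * sum(key) + self._epistasis[key]

fitness = RandomEpistasisFitness(L=50, V_A=0.0005, V_I=0.0045)
rng = random.Random(1)
genotypes = [tuple(rng.choice((-1, 1)) for _ in range(50)) for _ in range(4000)]
values = [fitness(g) for g in genotypes]
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
print(f"sample fitness variance = {variance:.5f} (target 0.005)")
```

Any change to the genotype yields a new key and hence an independent epistatic value, while an asexually copied genotype keeps its value.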

Pairwise epistasis model. Here, we consider epistasis due to pairwise
interactions between the different loci. Such pairwise interactions
correspond to $s_i s_j$ terms in the fitness function. The fitness of a
particular genotype g is determined by the independent effects of the
individual loci and the sum of the interactions between all pairs,

$F(g) = f \sum_{i=1}^{L} s_i + \sum_{i<j} f_{ij} s_i s_j$.   [4]

When assuming uniform epistasis between all possible pairs, we draw
the interaction strength $f_{ij}$ from a Gaussian distribution with zero
mean and variance $2V_I/[L(L-1)]$.
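A pairwise fitness function of this kind can be sketched as follows. The assumptions are mine and are labeled as such: the additive effect per locus is taken to be the same f as in the random epistasis model, and the couplings are normalized so that the sum of their squares equals V_I, which is the variance of the pairwise term over uniformly distributed genotypes. The function names are hypothetical.

```python
import math
import random

def draw_couplings(L, V_I, rng):
    """Draw f_ij for all pairs i < j, then rescale so that sum f_ij^2 = V_I."""
    raw = {(i, j): rng.gauss(0.0, 1.0) for i in range(L) for j in range(i + 1, L)}
    norm = math.sqrt(V_I / sum(x * x for x in raw.values()))
    return {pair: x * norm for pair, x in raw.items()}

def pairwise_fitness(g, f, couplings):
    """Additive part f * sum(s_i) plus the sum of f_ij * s_i * s_j over pairs."""
    additive = f * sum(g)
    epistatic = sum(fij * g[i] * g[j] for (i, j), fij in couplings.items())
    return additive + epistatic

rng = random.Random(2)
L, V_A, V_I = 30, 0.0005, 0.0045
f = math.sqrt(V_A / L)
couplings = draw_couplings(L, V_I, rng)
g = tuple(rng.choice((-1, 1)) for _ in range(L))
flipped = tuple(-s for s in g)
print(pairwise_fitness(g, f, couplings), pairwise_fitness(flipped, f, couplings))
```

Flipping every allele leaves each product s_i s_j unchanged, so the two printed values differ only through the additive term.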

Clustered epistasis. To mimic localized clusters of strongly
interacting genes on a weakly interacting background, we constructed the
matrix of $f_{ij}$'s as follows. The sparse background epistasis was
modeled by assigning each $f_{ij}$ a Gaussian random number with
probability p = 0.1 and zero otherwise. Then we built three epistatic
clusters with centers $c_k = 10, 50, 90$ by adding a Gaussian random
number to each $f_{ij}$ with probability $p = \exp(\ldots)$ with r = 10
for k = 1, 2, 3. All $f_{ij}$ were rescaled such that
$\sum_{i<j} f_{ij}^2 = V_I$.

Selection. Our model assumes non-overlapping generations. In each 
generation a pool of gametes is produced, to which each individual 
contributes a number of copies of its genome which is drawn from a 
Poisson distribution with parameter $\exp(F(g) - \bar{F})$.
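The selection step can be sketched as below. This is an illustration, not the original code: the helper names are hypothetical, the Poisson sampler is a standard textbook method chosen to keep the example free of external libraries, and genotypes are represented by simple labels for the demonstration.

```python
import math
import random

def poisson(lam, rng):
    """Sample a Poisson variate by Knuth's method (fine for small means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def gamete_pool(population, fitness, rng):
    """Each individual contributes Poisson(exp(F - mean F)) genome copies."""
    F = [fitness(g) for g in population]
    F_mean = sum(F) / len(F)
    pool = []
    for g, Fg in zip(population, F):
        pool.extend([g] * poisson(math.exp(Fg - F_mean), rng))
    return pool

rng = random.Random(3)
population = ["fit"] * 1000 + ["unfit"] * 1000   # labels stand in for genotypes
fitness = {"fit": 0.5, "unfit": -0.5}.get
pool = gamete_pool(population, fitness, rng)
print(pool.count("fit"), pool.count("unfit"))
```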

Gene re-assortment. To model gene re-assortment in a facultatively 
mating population, two gametes are chosen with probability r and a 
new genotype is formed by assigning each locus the allele of one or 
the other parent at random. Otherwise, the new genotype is an exact 
copy of one gamete. 
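Facultative mating with free re-assortment can be sketched as follows. The function name is hypothetical, and two details are assumptions not stated in the text: the number of offspring is fixed by the caller, and the two gametes are drawn with replacement, so a gamete may occasionally pair with a copy of itself.

```python
import random

def reassort(pool, n_offspring, r, rng):
    """With probability r mix two gametes locus by locus, else copy one gamete."""
    offspring = []
    for _ in range(n_offspring):
        if rng.random() < r:
            mother, father = rng.choice(pool), rng.choice(pool)
            # each locus takes the allele of one or the other parent at random
            child = tuple(rng.choice(alleles) for alleles in zip(mother, father))
        else:
            child = rng.choice(pool)
        offspring.append(child)
    return offspring

rng = random.Random(4)
pool = [(1,) * 8, (-1,) * 8]
clonal = reassort(pool, 200, r=0.0, rng=rng)
mixed = reassort(pool, 200, r=1.0, rng=rng)
print(sum(child not in pool for child in mixed), "of 200 offspring are new genotypes")
```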

Crossovers. Given a crossover rate $\rho$ per locus, the number of
crossovers is drawn from a Poisson distribution with parameter
$(L - 1)\rho$ and the crossover locations are chosen at random. When
the number of crossovers is zero, the offspring inherits the entire 
genome from one parent. To model circular chromosomes, the num- 
ber of crossovers is multiplied by two enforcing an even number of 
crossovers. 
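The crossover procedure can be sketched as below. Several details are assumptions of this sketch rather than statements from the text: crossover locations are drawn uniformly with replacement from the junctions between neighboring loci, the parent that contributes the first locus is chosen at random, and the function names are hypothetical.

```python
import math
import random

def poisson(lam, rng):
    """Sample a Poisson variate by Knuth's method (fine for small means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def crossover(a, b, rho, rng, circular=False):
    """Build one offspring chromosome from parents a and b."""
    L = len(a)
    n = poisson((L - 1) * rho, rng)
    if circular:
        n *= 2                      # forces an even number of crossovers
    # junction k lies between loci k-1 and k; junction 0 closes the circle
    cuts = [rng.randrange(0 if circular else 1, L) for _ in range(n)]
    parents = (a, b) if rng.random() < 0.5 else (b, a)
    child, which = [], 0
    for k in range(L):
        if k > 0:
            which = (which + cuts.count(k)) % 2   # switch parent at each cut
        child.append(parents[which][k])
    return tuple(child)

rng = random.Random(5)
a, b = (1,) * 20, (-1,) * 20
print(crossover(a, b, rho=0.1, rng=rng))
```

With no crossovers the list of cuts is empty and the offspring is an exact copy of one parent, as described above.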

Measuring genetic diversity. The allele entropy is a convenient
descriptor of genetic diversity that is readily calculated from the
evolving population. It is defined as
$S_a = -\sum_i [\nu_i \ln \nu_i + (1 - \nu_i) \ln(1 - \nu_i)]$, where
$\nu_i$ is the allele frequency at locus i.
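The allele entropy is straightforward to compute from a list of genotypes. A minimal sketch, assuming genotypes are tuples of +1/-1 alleles and taking the allele frequency to be the fraction of +1 alleles at each locus (the entropy is symmetric, so the choice of reference allele does not matter). The function names are hypothetical.

```python
import math

def allele_frequencies(population):
    """Fraction of +1 alleles at each locus."""
    N, L = len(population), len(population[0])
    return [sum(1 for g in population if g[i] == 1) / N for i in range(L)]

def allele_entropy(nu):
    """S_a = -sum_i [nu_i ln nu_i + (1 - nu_i) ln(1 - nu_i)]."""
    def h(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0              # fixed loci carry no diversity
        return -(x * math.log(x) + (1.0 - x) * math.log(1.0 - x))
    return sum(h(x) for x in nu)

population = [(1, 1, -1), (-1, 1, -1), (1, 1, 1), (-1, 1, 1)]
nu = allele_frequencies(population)
print(nu, allele_entropy(nu))
```

A fully polymorphic population with all frequencies at 0.5 has entropy L ln 2, and the entropy vanishes at fixation.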

Measuring linkage disequilibrium. LD is the deviation of the
frequency of a pair of alleles from the random expectation on the basis
of the individual allele frequencies, i.e.
$D_{ij} = \langle s_i s_j \rangle - \langle s_i \rangle \langle s_j \rangle$.
Kimura showed (21) that in QLE
$\psi_{ij} = D_{ij} / (\nu_i \bar{\nu}_i \nu_j \bar{\nu}_j)$ is time
independent despite changing allele frequencies $\nu_i$ and $\nu_j$
($\bar{\nu}_i = 1 - \nu_i$). To measure genome wide LD, we calculate the
sum of all squared LD terms $\sum_{i<j} \psi_{ij}^2$. Pairs with $\nu_i$
or $\nu_j$ smaller than 0.01 or larger than 0.99 were omitted. A
different normalization is used in Fig. 5, where $D'_{ij} = \ldots$ is
shown, see Ref. (19) for a recent review.
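The LD measures can be sketched as follows. Genotypes are assumed to be tuples of +1/-1 alleles. The normalization used here, the pairwise LD divided by the product of nu_i, 1 - nu_i, nu_j and 1 - nu_j, is an assumption of this sketch, as are the function names; the frequency cutoffs follow the text.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pair_ld(population, i, j):
    """D_ij = <s_i s_j> - <s_i><s_j> for alleles coded as +1/-1."""
    si = mean(g[i] for g in population)
    sj = mean(g[j] for g in population)
    sij = mean(g[i] * g[j] for g in population)
    return sij - si * sj

def genome_wide_ld(population, lo=0.01, hi=0.99):
    """Sum of squared normalized LD over all pairs with polymorphic loci."""
    L = len(population[0])
    nu = [(1 + mean(g[i] for g in population)) / 2 for i in range(L)]
    total = 0.0
    for i in range(L):
        for j in range(i + 1, L):
            if not (lo <= nu[i] <= hi and lo <= nu[j] <= hi):
                continue            # skip nearly fixed loci
            norm = nu[i] * (1 - nu[i]) * nu[j] * (1 - nu[j])
            psi = pair_ld(population, i, j) / norm
            total += psi * psi
    return total

linked = [(1, 1), (-1, -1)] * 50
unlinked = [(1, 1), (1, -1), (-1, 1), (-1, -1)] * 25
print(pair_ld(linked, 0, 1), pair_ld(unlinked, 0, 1))
```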